In the digital world, classifying huge volumes of real sensed data derived
from numerical problems is a challenging task due to the computational
complexity of the metaheuristic search process. Deep learning
approaches, including the convolutional neural network (CNN), long short-term
memory (LSTM), and bidirectional (BI)-LSTM, are suitable for analyzing XML
datasets (i.e., social media, trade center, and surveillance data exchanged
over the internet) within a reasonable processing time. However, they suffer
from process deviation when datasets grow beyond the expected
volume. This paper proposes a novel deep learning framework referred to as
archimed improved numerical optimization deep learning (AINODL) to
improve the classification of XML datasets. The proposed AINODL
framework first extracts features from XML documents using the vector
space model. Secondly, it classifies the XML data using the inbuilt function
of the AINODL framework. The experiments demonstrate that the
performance parameters accuracy (90%), sensitivity (93%), and specificity
(94%) of the proposed AINODL framework are significantly enhanced
compared with the existing approaches CNN, LSTM, and BI-LSTM.

1. INTRODUCTION

Nowadays, data is analyzed from different aspects to improve business growth. Numerical
problems are formulated over massive data to explore the various features of a data set. XML data
plays an essential role in many business intelligence applications. A machine learning technique is a
program that takes the data given by the user and analyzes it automatically based on the model built into the
program. An open challenge in business is achieving high accuracy in the domain of data mining, which
extracts the user-required data from a colossal data volume and enhances the process by using innovative
techniques that benefit the business layers. Most web applications are based on the XML model, which
follows a hierarchical structure for representing the given data [1]. As a result, the data classification
process is more complex than in other data models. A query may be processed across more than one XML
data set, forming an integrity operation among the data sets [2]; later, this was done based on semantic
information. Deep learning is a machine learning technique that performs multiple
interconnections with more than one hidden layer (i.e., the input of each hidden layer is the previous layer's
output) to achieve the required process. It therefore needs optimization to retain the business value by
protecting the application layer from external attacks. Such attacks are classified based on their vulnerabilities
[3], [4]. Spatio-temporal XML documents give spatial information, which is further updated,
inserted, and deleted based on a fuzzy spatio-temporal structure [5]. Various XML-based web
applications are classified in [6], [7]. Subsequently, the semantics of the XML data are analyzed, and
disambiguated data is generated using sphere neighborhood methods [8].
Similarly, XML data is mapped in real-time applications with suitable matches [9], [10].
XML data is classified by concept and context based on the information it carries [11]-[13]. The basic
structure of deep learning is used in network intrusion detection with various weight parameters between
hidden layers, and it has been applied more than once in real-time applications [14], [15]. The traditional machine
learning approach for classification and clustering is explained in [16], [17]. Efficient classification
without features is discussed in [18]-[20]. The kernel principal component
analysis-kernel extreme learning machine (KPCA-KELM) classification technique has been proposed to model fuzzy XML data;
its accuracy and training time are evaluated and compared with the existing extreme learning
machine (ELM) technique. Various classification techniques for XML data are introduced in [21]-[26]. The contributions
of this paper are as follows:

— To provide easy access to different XML datasets (social media, trade center, and surveillance data), to
extract salient feature points using the vector space model, and to effectively classify XML datasets into
their respective classes.

— To identify the intensity of feature depth and classify correctly by using the proposed archimed improved
numerical optimization deep learning (AINODL) framework, which operates continuously until sufficient
knowledge base information is acquired from the given dataset.

— To improve the performance of the training phase by gathering knowledge of salient features and by effectively
calculating adaptive parameters such as F1 score, precision, specificity, sensitivity, FPR, and accuracy.

The remainder of the paper is organized as follows. Section 2 reviews related work on the classification of
XML documents. The proposed AINODL framework is discussed in section 3. The experimental setup and results
are presented in section 4. Finally, the conclusion of the research is given in section 5.


2. RELATED WORK 

In this section, we look at the work of several authors who use deep learning frameworks such as the
convolutional neural network (CNN) [27], extreme learning machine (ELM), and long short-term memory
network (LSTM) to perform hierarchical classification of XML data. The success of a new method for
categorising XML data relies heavily on a well-designed optimization algorithm, because it determines how well
the best features can be gleaned from the acquired XML database. For this reason, the XML data classification issue
is tackled by considering the merits and drawbacks of a variety of optimization algorithms, discussed
as follows:


2.1. Archimedes optimization algorithm (AOA) 

The Archimedes optimization algorithm is a population-based technique that follows Archimedes’ principle [11]. It
is evaluated in terms of statistical significance, convergence ability, exploitation-exploration ratio, and the
diversity of AOA solutions. In this technique, an object is immersed either fully or partially, depending
on the classification feature. The object state is measured by (1).

$G_o = W_o = D \cdot V \cdot A$ (1)

where $G_o$ is the gravity of the object and $W_o$ is the weight of the object. It is
equivalent to the product of the likelihood neighbor factors, where $D$ is the density, $V$ is the volume, and $A$ is the
acceleration of the object. This method can improve optimization performance on highly non-linear problems in large
volumes of data. It supports optimal classification based on whether the object moves
upward or downward from its current location, so that each data point reaches the
optimal location. As a result, the statistical significance, convergence ability, and
exploitation-exploration ratio are generated by using the population. However, it has not been applied to more
fundamental real-world problems. Luan and Lin [12] analyzed the performance of the sensitivity parameter of the
given data. Their work involved various artificial intelligence techniques to evaluate more than one sensitivity parameter
that may occur in the actual data of a business application.


2.2. CNN and LSTM 

Shone et al. [13] addressed the improvement of the error backpropagation network through a variety
of multi-layer neural networks. Features are extracted as high-level representations when the network is applied to
tasks such as facial recognition, document analysis, historical data collection,
environmental information gathering, grey areas, advertising, and medical diagnosis. A non-linear
activation function is applied to the output of the convolution operation, and full connection is used for the
classification. The data features are extracted by the kernel function, which is specified based on the
classification approach. Thereby, it tracks the location point's frame to follow the object's lower and upper
movement in the specified location. Long short-term memory (LSTM), in turn, is a type of recurrent
neural network (RNN). It follows the principle of long-term dependency of the object and updates the
information of the object based on the current cell state. This technique follows three
stages: i) the forget stage, which decides the information to be deleted based on the previous update of the
object; ii) the input stage, in which the information to be used in the classification is generated; and iii) the
output stage, which decides which information is to be updated.


$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$ (2)
$in_t = \sigma(W_{in} [h_{t-1}, x_t] + b_{in})$ (3)
$op_t = \sigma(W_{op} [h_{t-1}, x_t] + b_{op})$ (4)

where $f_t$ is the forget state of the object at level $t$, $\sigma$ is the sigmoid function, $W_f$ and $b_f$ are the weight
and bias of the forget state, $in_t$ is the input of the current state, $W_{in}$ and $b_{in}$ are the weight and bias of the input
state, $op_t$ is the output state of the object at level $t$, $W_{op}$ and $b_{op}$ are the weight and bias of the output stage,
$h_{t-1}$ is the hidden value of the object at the previous level $t-1$, and $x_t$ is the current input of the
classification. The performance parameters precision, recall, and F-score are evaluated, and the results are
analyzed.
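The gate computations in (2)-(4) can be illustrated with a minimal Python sketch. It assumes the standard LSTM reading of the equations, in which each gate applies a sigmoid to a weighted combination of the previous hidden value and the current input. The helper names, layer sizes, and random weights are hypothetical and are not taken from the paper.

```python
import math
import random


def sigmoid(z):
    """Logistic function used by all three gates."""
    return 1.0 / (1.0 + math.exp(-z))


def gate(weights, bias, h_prev, x_t):
    """Compute sigma(W [h_prev, x_t] + b) for one gate."""
    concat = h_prev + x_t  # list concatenation stands in for [h_{t-1}, x_t]
    return [
        sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
        for row, b in zip(weights, bias)
    ]


def lstm_gates(h_prev, x_t, params):
    """Return the forget, input, and output gate activations of (2)-(4)."""
    return {name: gate(W, b, h_prev, x_t) for name, (W, b) in params.items()}


def random_matrix(rows, cols, rng):
    """Hypothetical weight initialisation, uniform in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
hidden_size, input_size = 4, 3  # illustrative sizes, not taken from the paper
params = {
    name: (random_matrix(hidden_size, hidden_size + input_size, rng),
           [0.0] * hidden_size)
    for name in ("forget", "input", "output")
}
gates = lstm_gates([0.0] * hidden_size, [0.5, -1.0, 2.0], params)
for name, values in gates.items():
    print(name, [round(v, 3) for v in values])
```

Every activation lies strictly between 0 and 1, which is what lets each gate act as a soft switch on the information it controls.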


2.3. ELM 

The ELM is a technique used in the earliest classification of XML documents [2]. The features are extracted
using a vector space model, XML data is classified based on these features, and the classification
performance is evaluated on the given data set. This technique uses a single-hidden-layer feed-forward neural network. The
learning speed of this network is fast, and its classification performance also increases significantly
compared with the traditional neural network. Let there be $M$ random samples
$(x_j, t_j) \in \mathbb{R}^{n} \times \mathbb{R}^{m}$. The ELM model is then represented by:

$f(x_j) = \sum_{i=1}^{H} \beta_i G_i(x_j)$ (5)

where $H$ represents the number of hidden layer nodes, $G_i$ is the activation function, and $\beta_i$ is the weight vector
between the $i$th hidden node and the output nodes in the model. The main drawback of this technique is that the
output weights are calculated by matrix operations in a single step, without iteratively tuning the weights.
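The single-step training described for the ELM can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used in the cited work: the hidden weights are drawn at random and never tuned, the activation is a sigmoid, and the output weights are obtained from ridge-regularised normal equations as a stand-in for the pseudo-inverse usually used with ELM. The names `train_elm` and `solve_linear`, the hidden size, and the toy data are hypothetical.

```python
import math
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def train_elm(X, t, H=8, ridge=1e-6, seed=0):
    """Fit f(x) = sum_i beta_i * G_i(x) with random, untuned hidden nodes."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(H)]
    bias = [rng.uniform(-1.0, 1.0) for _ in range(H)]

    def hidden(x):
        # Sigmoid activation G_i(x) of each hidden node.
        return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
                for row, b in zip(W, bias)]

    G = [hidden(x) for x in X]
    # Normal equations (G^T G + ridge * I) beta = G^T t, solved in one step.
    A = [[sum(g[i] * g[j] for g in G) + (ridge if i == j else 0.0)
          for j in range(H)] for i in range(H)]
    rhs = [sum(g[i] * target for g, target in zip(G, t)) for i in range(H)]
    beta = solve_linear(A, rhs)
    return lambda x: sum(b_i * g_i for b_i, g_i in zip(beta, hidden(x)))


# Toy data: the target equals the first input feature.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
t = [0.0, 0.0, 1.0, 1.0]
model = train_elm(X, t)
print([round(model(x), 2) for x in X])
```

Only `beta` is learned, and it comes from a single linear solve. No gradient steps or repeated passes over the data are involved.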


3. PROPOSED AINODL FRAMEWORK 
The proposed AINODL framework consists of three essential steps: i) feature extraction from the user
data, ii) an improved Bi-LSTM deep learning method for the classification process, and iii) the performance
evaluation module. The overall system design of the proposed AINODL framework is given in Figure 1.
The detailed description of the proposed AINODL framework is given as:
— Step 1: Initialize the position of all objects, which is used to find the optimal location of each object. The
primary objective function variable $f_{obj}$ is mathematically computed by:

$f_{obj} = (U - L) + b \cdot rand \cdot m$ (6)

where $U$ is the upper boundary of the search agent, $L$ is the lower boundary of the search agent, $rand$ is the
dimensional random variable generated between 0 and 1, $b$ is the bias value, and $m$ is the number of features.
— Step 2: Compute the density, volume, and acceleration in every iteration and update all object values
dynamically. Acceleration is the mathematical model used to identify the best object in the set of objects. It
is denoted by $a$:

$a_j^{m+1} = \frac{a_j^{m} - \min(a)}{\max(a) - \min(a)} + rand \cdot m$ (7)

where $a$ is the acceleration that moves object $j$ toward its best position. The values of density and
volume are defined between 0 and 1, and both are varied linearly in each iteration $m$.
Density and volume are computed by:
$d_j^{m+1} = d_j^{m} + rand \cdot (d_{best} - d_j^{m}) \cdot b$ (8)

$v_j^{m+1} = v_j^{m} + rand \cdot (v_{best} - v_j^{m}) \cdot b$ (9)


In the beginning, collisions occur among the objects. After some time, each object tries to reach a stable
state. The stable state of the object is defined by the term transfer object (TO).

$TO = \exp\left(\frac{m - iter}{iter}\right) \cdot rand$ (10)

where $iter$ is the number of iterations of the proposed algorithm. The density value is decreased
based on the value of $iter$. This value assists neighbors in finding the best search location and is
given by:

$d^{m+1} = \exp\left(\frac{iter - m}{iter}\right) - \left(\frac{m}{iter}\right) \cdot m \cdot b$ (11)

This leads to achieving an appropriate balance between exploration and exploitation.

— Step 3: Update the direction in which the object moves. The direction is updated by:

$P_{position} = 4 \cdot rand \cdot d \cdot b$ (12)

where $d$ is the control variable, which is set to 0.4.

— Step 4: Tune the object using E-LSTM. In the proposed model, the deep learning model E-LSTM is
tuned to improve the performance of the classification process on the input XML data. In general, the
classification performance depends on the parameters used in the implementation. The parameters
of the E-LSTM are the number of layers (nl), the number of hidden neurons (nn), the batch size
(bs), the epoch size (es), and the learning rate (lr). These parameters are utilized to improve
the classification process.

— Step 5: Performance evaluation. The performance of the proposed algorithm is evaluated by accuracy,
sensitivity, specificity, precision, FPR, and F1 score. Accuracy is the percentage of the data that is
predicted correctly by the algorithm. Sensitivity is the true positive recognition rate of the
algorithm, calculated from the total number of true positives and the total number of false negatives.
Specificity is the true negative recognition rate of the algorithm, calculated from the total number of
true negatives and the total number of false positives. Precision is the rate of relevant objects
among the retrieved objects. F1 score is the harmonic mean of precision and
sensitivity.
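The measures named in Step 5 follow the standard confusion-matrix definitions, which can be sketched in Python as below. The function name and the example counts are hypothetical and serve only to show how the six values relate to one another; they are not results from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the six Step 5 measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # relevant share of retrieved objects
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "fpr": fp / (fp + tn),     # false positive rate = 1 - specificity
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical counts chosen for illustration only.
metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because FPR and specificity share the same denominator, they always sum to 1, so reporting both is a useful consistency check on a results table.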


[Figure 1 flow: Feature Extraction -> Set the Primary Parameters -> Update Population Value -> Tuning Model -> Improved Bi-LSTM Function (Deep Learning Method) -> Performance Evaluation]

Figure 1. Overall system design of the proposed model


3.1. Pseudo code of proposed AINODL framework 
Algorithm 1 takes an XML document as input. The primary parameters L
(lower bound), U (upper bound), and the control variables C3 and C4 are initialized to improve the performance of the
classification. The iteration executes until robust performance is reached, updating the acceleration (a), density (d), and
volume (v) of the object with optimal values. The computation of a, d, and v depends on the previous object
position in the current execution.
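The iterative update described above can be sketched in Python. This is a minimal illustration under stated assumptions rather than the authors' MATLAB code: the reference term in the density and volume updates (8) and (9) is read as the best value found so far, the transfer value follows (10) with the clamp at 1 used in Algorithm 1, and the function names, starting values, and iteration count are hypothetical.

```python
import math
import random


def transfer_value(m, iters, rand=1.0):
    """TO = exp((m - iters) / iters) * rand, clamped at 1 as in Algorithm 1."""
    return min(math.exp((m - iters) / iters) * rand, 1.0)


def update_toward_best(current, best, b=1.0, rng=random):
    """Shared form of (8) and (9): current + rand * (best - current) * b."""
    return current + rng.random() * (best - current) * b


rng = random.Random(0)
iters = 10
density, volume = 0.2, 0.8            # hypothetical starting values in [0, 1]
best_density, best_volume = 0.6, 0.4  # assumed best values found so far

for m in range(1, iters + 1):
    to = transfer_value(m, iters)
    density = update_toward_best(density, best_density, rng=rng)
    volume = update_toward_best(volume, best_volume, rng=rng)
    print(m, round(to, 3), round(density, 3), round(volume, 3))
```

With b = 1, each step moves an object part of the way toward the best value without overshooting it, while the transfer value rises toward 1 as the iteration count approaches its limit.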

Figure 2 shows the flow diagram of the proposed AINODL framework for XML data classification.
An XML document consists of one root element and more than one child element. These elements are
extracted using a vector space model. It uses the Gaussian kernel function to extract the details from the
hierarchical structure and identify the path of each element from the root element. The number of features is
generated based on the derived classification. The required parameters are initialized based on the design of
the algorithm, such as density (d), volume (v), acceleration (a), and the E-LSTM parameters (number of layers,
learning rate). The framework then checks whether the generated parameters are sufficient to improve the performance of the
classification and computes the fitness objective function TO. If TO is greater than 1, the values of a, d, and v are
computed using (7), (8), and (9), respectively. Otherwise, they are computed using the LSTM numerical computation. The
primary parameters are updated based on the number of iterations.


[Figure 2 flow: Input XML document -> Extract XML elements -> Extract class feature -> Initialize parameters -> Generate a, d, v values -> Verify the generated values -> if TO > 1, compute a, d, v using enhanced AOA-MRF; otherwise update a, d, v using E-LSTM computation -> Update primary parameters -> Display performance parameters]

Figure 2. Flow diagram of the proposed AINODL framework


Algorithm 1. AINODL

Input: XML document
Output: Metrics of performance parameters
Initialize parameters: iter, U, L, x_t, β_i, G_i, m, d, D, SA, f_obj, d_min
L = -1.5 (lower bound)
U = 1.5 (upper bound)
C3 = 0.1; C4 = 0.1
Set the class label C
Call function AOAMRF(iter, U, L, x_t, m, D, SA, f_obj, C3, C4, X_test, Y_test)
    C1 = 0.5, C2 = 0.5
    U = 0.3, l = 0.5
    Compute a, d, and v from (7), (8), and (9), respectively
    for t = 1 : iter
        TO = exp((t - iter) / iter)
        if TO > 1
            TO = 1
        end if
        Update a, d, and v values
    end for
    for k = 1 : dim
        T = TO * C3
        if T > 1
            T = 1
        end if
        Again, update a, d, and v values
    end for


This article uses the dataset reed.xml to evaluate the performance of the classification
technique. A sample XML document element is shown in Figure 3. The root element is root. This root
element has the child element course. The course element has more than one sub-element,
such as reg_num, subj, crse, sect, title, units, and instructor. This article uses the units element as the feature to
perform the classification operation. The path of the units element in Figure 3 is root\course\units.


<root>
  <course>
    <reg_num>20573</reg_num>
    <subj>ANTH</subj>
    <crse>344</crse>
    <sect>S01</sect>
    <title>Sex and Gender</title>
    <units>1.0</units>
    <instructor>Makley</instructor>
    <days>T-Th</days>
    <time>
      <start_time>10.30AM</start_time>
      <end_time>11.50AM</end_time>
    </time>
    <place>
      <building>VOLLUM</building>
      <room>120</room>
    </place>
  </course>
</root>

Figure 3. Sample XML elements
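The element paths discussed above, such as root\course\units, can be recovered with Python's standard xml.etree.ElementTree module. The sketch below is a hedged illustration: the shortened sample document, the helper name `element_paths`, and the use of path counts as a simple structural term vector are assumptions made for the example, not the feature extraction used in the paper.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Shortened, hypothetical excerpt in the style of the sample document.
SAMPLE = """
<root>
  <course>
    <reg_num>20573</reg_num>
    <subj>ANTH</subj>
    <units>1.0</units>
    <time>
      <start_time>10.30AM</start_time>
    </time>
  </course>
</root>
"""


def element_paths(element, prefix=""):
    """Yield (path, text) for an element and all of its descendants."""
    path = prefix + "\\" + element.tag if prefix else element.tag
    yield path, (element.text or "").strip()
    for child in element:
        yield from element_paths(child, path)


root = ET.fromstring(SAMPLE)
paths = list(element_paths(root))
values = dict(paths)
# Counting how often each path occurs gives a simple structural term vector.
term_vector = Counter(path for path, _ in paths)

for path, text in paths:
    print(path, "=", repr(text))
```

Looking up `values["root\\course\\units"]` returns the text of the units element, which is the value the article uses as its classification feature.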
Figure 4. Performance of F1 score comparison between the existing and proposed model


4. EXPERIMENTAL SET-UP AND RESULTS 

The experiments were carried out on Windows 7/10 with a minimum configuration
of 4 GB RAM. The proposed AINODL framework is implemented in MATLAB R2019a. The
dataset reed.xml is used in the performance evaluation. Table 1 gives the overall results generated by the
proposed AINODL framework and the existing techniques: convolutional neural network (CNN), long short-term
memory (LSTM), and bidirectional (BI)-LSTM. The computation of the performance parameters follows
[6]. This article uses the element units to complete the classification of the reed.xml dataset.

Figure 4 shows the F1 score of the proposed AINODL
framework compared with the existing techniques CNN, BI-LSTM, and LSTM. A high F1 score
is desirable for any classification algorithm. In this paper, the proposed AINODL framework achieves an F1 score
of 89%, a significant increase compared with CNN (76%), BI-LSTM (77%), and LSTM (53%).
The proposed AINODL framework also reaches a high precision value. Figure 5 shows the
precision of the proposed AINODL framework compared with the existing
techniques CNN, BI-LSTM, and LSTM. The proposed AINODL framework
produces a better result than the current techniques: nearly 45% improvement is achieved
compared with LSTM, and 20% and 10% improvements are achieved over BI-LSTM and CNN,
respectively. The specificity parameter should also be kept as high as possible. The
proposed AINODL framework retains the maximum specificity of 94%, which
compares well with CNN (93%), BI-LSTM (89%), and LSTM (77%), as shown in Figure
6. Figure 7 shows that the sensitivity of the proposed AINODL framework is
significantly better than that of the existing techniques: nearly 30% improvement is
achieved compared with LSTM, and 15% and 12% improvements over BI-LSTM
and CNN, respectively. In addition, for practical processing, the FPR value should be kept as low as
possible. Among all four techniques, the proposed AINODL framework produces
the lowest FPR. It reduces the FPR value to
5%, which is lower than CNN (6%), BI-LSTM (10%), and LSTM (22%), as
shown in Figure 8. The accuracy comparison between the existing and the proposed
techniques is shown in Figure 9. For accuracy, the proposed AINODL framework achieves nearly 50%
improvement compared with LSTM, and 2% and 12% improvements over
BI-LSTM and CNN, respectively. The results indicate that the proposed AINODL framework
yields better results in all parameters (F1 score, precision, specificity, sensitivity,
FPR, and accuracy) than the existing techniques CNN, BI-LSTM, and LSTM.


Table 1. Performance parameter results between existing and proposed technique

Technique information           Accuracy (%)  Sensitivity (%)  Specificity (%)  Precision (%)  FPR (%)  F1 score (%)
Existing technique   CNN        89            74               93               81             6        76
Existing technique   Bi-LSTM    78            77               89               77             10       77
Existing technique   LSTM       52            62               77               54             22       53
Proposed technique   AINODL     90            93               94               91             3        89
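The gap between AINODL and each baseline can be read directly from Table 1 as a difference in percentage points. The short Python sketch below shows that arithmetic for five of the measures; the values are transcribed by hand from the table, and the dictionary layout and function name are hypothetical.

```python
# Values transcribed from Table 1 (percent); FPR is left out of this sketch.
TABLE_1 = {
    "CNN":     {"accuracy": 89, "sensitivity": 74, "specificity": 93, "precision": 81, "f1": 76},
    "Bi-LSTM": {"accuracy": 78, "sensitivity": 77, "specificity": 89, "precision": 77, "f1": 77},
    "LSTM":    {"accuracy": 52, "sensitivity": 62, "specificity": 77, "precision": 54, "f1": 53},
    "AINODL":  {"accuracy": 90, "sensitivity": 93, "specificity": 94, "precision": 91, "f1": 89},
}


def gains_over_baselines(table, proposed="AINODL"):
    """Percentage-point difference between the proposed row and each baseline."""
    reference = table[proposed]
    return {
        name: {metric: reference[metric] - value for metric, value in row.items()}
        for name, row in table.items()
        if name != proposed
    }


gains = gains_over_baselines(TABLE_1)
for name, row in gains.items():
    print(name, row)
```

These are absolute differences in percentage points. A relative improvement would instead divide each difference by the baseline value, so the two conventions give different figures for the same table.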
[Figures 5 to 9: bar charts of each performance parameter (%) against technique name]

Figure 5. Performance of precision comparison between the existing and proposed model

Figure 6. Performance of specificity comparison between the existing and proposed model

Figure 7. Performance of sensitivity comparison between the existing and proposed model

Figure 8. Performance of FPR comparison between the existing and proposed model

Figure 9. Performance of accuracy comparison between the existing and proposed model


5. CONCLUSION
To obtain robust classification performance on fuzzy XML documents, the AINODL technique has been
proposed in this study. First, the tree structure of the XML document is generated using a parsing
method. Then the features are extracted using the Kernel PCA model. The AINODL technique
classifies the XML data based on the extracted features of the XML document. Various
performance parameters have been evaluated using the AINODL classification technique. The results show that
the performance parameters accuracy (90%), sensitivity (93%), and specificity (94%) of the proposed
AINODL framework are significantly enhanced compared with the existing approaches CNN, LSTM, and
BI-LSTM.